Abstract

In this paper, we show that, due to the structural properties of the re- 
sulting automaton obtained from a prior operation, the state complexity 
of a combined operation may not be equal but close to the mathematical 
composition of the state complexities of its component operations. In par- 
ticular, we provide two witness combined operations: reversal combined 
with catenation and star combined with catenation. 



1 Introduction 

State complexity is a type of descriptional complexity based on the deterministic
finite automaton (DFA) model. The state complexity of an operation on regular
languages is the number of states that are necessary and sufficient in the worst
case for the minimal, complete DFA that accepts the resulting language of the
operation. While many results on the state complexities of individual operations,
such as union, intersection, catenation, star, reversal, shuffle, orthogonal
catenation, proportional removal, and cyclic shift, have been obtained in the
past 15 years, the research of state complexities
of combined operations, which was initiated by A. Salomaa, K. Salomaa, and
S. Yu in 2007 [20], is attracting more attention. This is because, in practice,
a combination of several individual operations, rather than only one individual 
operation, is often performed in a certain order. For example, in order to ob- 
tain a precise regular expression, a combination of basic operations is usually 
required. 

In recent publications, it has been shown that
the state complexity of a combined operation is not always a simple mathemat- 
ical composition of the state complexities of its component operations. This is 
sometimes due to the structural properties of the DFA accepting the resulting 
language obtained from a prior operation of a combined operation. For exam- 
ple, the languages that are obtained from performing reversal and reach the 
upper bound of the state complexity of this operation are accepted by DFAs 
such that half of their states are final; and the initial state of the DFA accepting 
a language obtained after performing star is always a final state. As a result, 
the resulting language obtained from a prior operation may not be among the
worst cases of the subsequent operation. Since such issues are not addressed
by the study of the state complexity of individual operations, they are certainly
important in the research of the state complexity of combined operations. Although
the number of combined operations is unlimited and it is impossible to
study the state complexities of all of them, the study on combinations of two 
individual operations is clearly necessary. 

In this paper, we study the state complexities of reversal combined with
catenation, i.e., $L(A)^R L(B)$, and star combined with catenation, i.e., $L(A)^* L(B)$,
for minimal complete DFAs $A$ and $B$ of sizes $m, n \ge 1$, respectively. For
$L(A)^R L(B)$, we will show that the general upper bound $\frac{3}{4}2^{m+n}$, which is close to
the composition of the state complexities of reversal and catenation, $2^{m+n} - 2^{n-1}$,
is reachable when $m, n \ge 2$, and that it can be lowered to $2^{n-1}$ and $2^{m-1} + 1$ when
$m = 1$ and $n \ge 1$ and when $m \ge 2$ and $n = 1$, respectively. For $L(A)^* L(B)$,
we will show that, if $A$ has only one final state and it is also the initial state,
i.e., $L(A) = L(A)^*$, the state complexity of catenation (also of $L(A)^* L(B)$) is
$m(2^n - 1) - 2^{n-1} + 1$, which is lower than that of catenation, $m2^n - 2^{n-1}$. In the
other cases, that is, when $A$ contains some final states that are not the initial
state, the state complexity of $L(A)^* L(B)$ is $5 \cdot 2^{m+n-3} - 2^{m-1} - 2^n + 1$ instead of
$\frac{3}{4}2^{m+n} - 2^{n-1}$, the composition of the state complexities of star and catenation.

In the next section, we introduce the basic definitions and notations used in 
the paper. Then, we prove our results on reversal combined with catenation and 
star combined with catenation in Sections 3 and 4, respectively. We conclude
the paper in Section 5.

2 Preliminaries 

A DFA is denoted by a 5-tuple $A = (Q, \Sigma, \delta, s, F)$, where $Q$ is the finite set of
states, $\Sigma$ is the finite input alphabet, $\delta : Q \times \Sigma \to Q$ is the state transition
function, $s \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states. A DFA
is said to be complete if $\delta(q, a)$ is defined for all $q \in Q$ and $a \in \Sigma$. All the
DFAs we mention in this paper are assumed to be complete. We extend $\delta$ to
$Q \times \Sigma^* \to Q$ in the usual way.

A non-deterministic finite automaton (NFA) is denoted by a 5-tuple $A =
(Q, \Sigma, \delta, s, F)$, where the definitions of $Q$, $\Sigma$, $s$, and $F$ are the same as those of
DFAs, but the state transition function $\delta$ is defined as $\delta : Q \times \Sigma \to 2^Q$, where
$2^Q$ denotes the power set of $Q$, i.e. the set of all subsets of $Q$.

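In this paper, the state transition function $\delta$ is often extended to $\hat{\delta} : 2^Q \times \Sigma \to
2^Q$. The function $\hat{\delta}$ is defined by $\hat{\delta}(R, a) = \{\delta(r, a) \mid r \in R\}$, for $R \subseteq Q$ and
$a \in \Sigma$. We just write $\delta$ instead of $\hat{\delta}$ if there is no confusion.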

A word $w \in \Sigma^*$ is accepted by a finite automaton if $\delta(s, w) \cap F \neq \emptyset$. Two
states in a finite automaton $A$ are said to be equivalent if and only if for every
word $w \in \Sigma^*$, if $A$ is started in either state with $w$ as input, it either accepts
in both cases or rejects in both cases. It is well-known that a language which
is accepted by an NFA can be accepted by a DFA, and such a language is said
to be regular. The language accepted by a DFA $A$ is denoted by $L(A)$. The
reader may refer to [13, 24] for more details about regular languages and finite
automata.

The state complexity of a regular language L, denoted by sc(L), is the number 
of states of the minimal complete DFA that accepts L. The state complexity 
of a class $S$ of regular languages, denoted by $sc(S)$, is the supremum among all
$sc(L)$, $L \in S$. The state complexity of an operation on regular languages is the
state complexity of the resulting languages from the operation as a function of 
the state complexity of the operand languages. Thus, in a certain sense, the 
state complexity of an operation is a worst-case complexity. 
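
The definition of sc(L) can be made concrete with a short computation. The following Python sketch counts the states of the minimal complete DFA by discarding unreachable states and then merging equivalent states with Moore-style partition refinement. The function name `minimal_dfa_size` and the dictionary encoding of the transition function are illustrative assumptions, not notation used in this paper.

```python
from collections import deque

def minimal_dfa_size(alphabet, delta, start, finals):
    """Number of states of the minimal complete DFA equivalent to the given
    complete DFA. `delta` maps (state, letter) -> state (assumed encoding)."""
    # Step 1: keep only the states reachable from the initial state.
    reachable, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                queue.append(r)
    # Step 2: start from the final / non-final partition and split blocks
    # until states in one block agree on the blocks of all their successors.
    block = {q: int(q in finals) for q in reachable}
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for q in reachable:
            sig = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            refined[q] = ids.setdefault(sig, len(ids))
        if len(ids) == count:      # no block was split: partition is stable
            return count
        block, count = refined, len(ids)

# A 3-state DFA for "words over {a, b} ending in a"; states 0 and 2 are equivalent.
delta = {(0, 'a'): 1, (0, 'b'): 2, (1, 'a'): 1,
         (1, 'b'): 2, (2, 'a'): 1, (2, 'b'): 2}
print(minimal_dfa_size("ab", delta, 0, {1}))
```

Equivalent states collapse into one block, so the printed value is the state complexity of the accepted language rather than the size of the DFA that happened to be given.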

3 Reversal combined with catenation 

In this section, we study the state complexity of $L_1^R L_2$ for an $m$-state DFA
language $L_1$ and an $n$-state DFA language $L_2$. We first show that the state
complexity of $L_1^R L_2$ is upper bounded by $\frac{3}{4}2^{m+n}$ in general (Theorem 1). Then
we prove that this upper bound can be reached when $m, n \ge 2$ (Theorem 2).
Next, we investigate the case when $m = 1$ and $n \ge 1$ and prove that the state
complexity can be lowered to $2^{n-1}$ in such a case (Theorem 4). Finally, we show that
the state complexity of $L_1^R L_2$ is $2^{m-1} + 1$ when $m \ge 2$ and $n = 1$ (Theorem 7).

Now, we start with a general upper bound on the state complexity of $L_1^R L_2$ for
any integers $m, n \ge 1$.

Theorem 1. For two integers $m, n \ge 1$, let $L_1$ and $L_2$ be two regular languages
accepted by an $m$-state DFA and an $n$-state DFA, respectively. Then there exists
a DFA of at most $\frac{3}{4}2^{m+n}$ states that accepts $L_1^R L_2$.

Proof. Let $M = (Q_M, \Sigma, \delta_M, s_M, F_M)$ be a DFA of $m$ states, $k_1$ final states and
$L_1 = L(M)$. Let $N = (Q_N, \Sigma, \delta_N, s_N, F_N)$ be another DFA of $n$ states and
$L_2 = L(N)$.

Let $M' = (Q_M, \Sigma, \delta_{M'}, F_M, \{s_M\})$ be an NFA with $k_1$ initial states, where
$\delta_{M'}(p, a) = q$ if $\delta_M(q, a) = p$, for $a \in \Sigma$ and $p, q \in Q_M$. Clearly,

$$L(M') = L(M)^R = L_1^R.$$

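By performing subset construction on NFA $M'$, we can get an equivalent,
$2^m$-state DFA $A = (Q_A, \Sigma, \delta_A, s_A, F_A)$ such that $L(A) = L_1^R$. Since $M'$ has
only one final state $s_M$, we know that $F_A = \{i \mid i \subseteq Q_M, s_M \in i\}$. Thus, $A$ has
$2^{m-1}$ final states in total. Now we construct a DFA $B = (Q_B, \Sigma, \delta_B, s_B, F_B)$
accepting the language $L_1^R L_2$, where

$$Q_B = \{(i, j) \mid i \in Q_A, j \subseteq Q_N\},$$

$$s_B = \begin{cases} (s_A, \emptyset) & \text{if } s_A \notin F_A, \\ (s_A, \{s_N\}) & \text{otherwise,} \end{cases}$$

$$F_B = \{(i, j) \in Q_B \mid j \cap F_N \neq \emptyset\},$$

$$\delta_B((i, j), a) = \begin{cases} (i', j') & \text{if } \delta_A(i, a) = i', \delta_N(j, a) = j', a \in \Sigma, i' \notin F_A, \\ (i', j' \cup \{s_N\}) & \text{if } \delta_A(i, a) = i', \delta_N(j, a) = j', a \in \Sigma, i' \in F_A. \end{cases}$$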

From the above construction, we can see that all the states in $B$ starting with
$i \in F_A$ must end with $j$ such that $s_N \in j$. There are in total $2^{m-1} \cdot 2^{n-1}$ states
which don't meet this.

Thus, the number of states of the minimal DFA accepting $L_1^R L_2$ is no more
than

$$2^m \cdot 2^n - 2^{m-1} \cdot 2^{n-1} = \frac{3}{4} 2^{m+n}.$$

□
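
The counting step at the end of this proof can be checked by brute force for small sizes. The sketch below enumerates all pairs $(i, j)$ with $i \subseteq Q_M$ and $j \subseteq Q_N$ and keeps those that satisfy the restriction from the construction. Taking both initial states to be the state numbered 0, and the helper names `subsets` and `count_useful_pairs`, are assumptions made only for this illustration.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def count_useful_pairs(m, n):
    """Count pairs (i, j) that can occur as states of B: whenever i is final
    in A (it contains s_M), j must contain s_N. Assumes s_M = s_N = 0."""
    s_M, s_N = 0, 0
    return sum(1 for i in subsets(range(m)) for j in subsets(range(n))
               if not (s_M in i and s_N not in j))

for m in range(1, 5):
    for n in range(1, 5):
        # 4 * count == 3 * 2^(m+n) avoids fractions in the comparison.
        assert 4 * count_useful_pairs(m, n) == 3 * 2 ** (m + n)
```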

This result gives an upper bound for the state complexity of $L_1^R L_2$. Next we
show that this bound is reachable when $m, n \ge 2$.

Theorem 2. Given two integers $m, n \ge 2$, there exists a DFA $M$ of $m$ states
and a DFA $N$ of $n$ states such that any DFA accepting $L(M)^R L(N)$ needs at
least $\frac{3}{4}2^{m+n}$ states.

Proof. Let $M = (Q_M, \Sigma, \delta_M, 0, \{m-1\})$ be a DFA, shown in Figure 1, where
$Q_M = \{0, 1, \ldots, m-1\}$, $\Sigma = \{a, b, c, d\}$, and the transitions are given as:

• $\delta_M(i, a) = i + 1 \bmod m$, $i = 0, \ldots, m-1$,

• $\delta_M(i, b) = i$, $i = 0, \ldots, m-2$, $\delta_M(m-1, b) = m-2$,

• $\delta_M(m-2, c) = m-1$, $\delta_M(m-1, c) = m-2$,
if $m \ge 3$, $\delta_M(i, c) = i$, $i = 0, \ldots, m-3$,

• $\delta_M(i, d) = i$, $i = 0, \ldots, m-1$.



Figure 1: Witness DFA M of Theorem 2 showing that the upper bound in
Theorem 1 is reachable when $m, n \ge 2$

Let $N = (Q_N, \Sigma, \delta_N, 0, \{n-1\})$ be a DFA, shown in Figure 2, where $Q_N =
\{0, 1, \ldots, n-1\}$, $\Sigma = \{a, b, c, d\}$, and the transitions are given as:

• $\delta_N(i, a) = i$, $i = 0, \ldots, n-1$,

• $\delta_N(i, b) = i$, $i = 0, \ldots, n-1$,

• $\delta_N(i, c) = 0$, $i = 0, \ldots, n-1$,

• $\delta_N(i, d) = i + 1 \bmod n$, $i = 0, \ldots, n-1$.



Figure 2: Witness DFA N of Theorem 2 showing that the upper bound in
Theorem 1 is reachable when $m, n \ge 2$

Now we design a DFA $A = (Q_A, \Sigma, \delta_A, \{m-1\}, F_A)$, where $Q_A = \{q \mid q \subseteq Q_M\}$,
$\Sigma = \{a, b, c, d\}$, $F_A = \{q \mid 0 \in q, q \in Q_A\}$, and the transitions are defined as:

$$\delta_A(p, e) = \{j \mid \delta_M(j, e) = i, i \in p\}, \quad p \in Q_A, e \in \Sigma.$$

It is easy to see that $A$ is a DFA that accepts $L(M)^R$. We prove that $A$ is
minimal before using it.

(I) We first show that every state $I \in Q_A$ is reachable from $\{m-1\}$. There
are three cases.

1. $|I| = 0$. $|I| = 0$ if and only if $I = \emptyset$. $\delta_A(\{m-1\}, b) = I = \emptyset$.

2. $|I| = 1$. Let $I = \{i\}$, $0 \le i \le m-1$. $\delta_A(\{m-1\}, a^{m-1-i}) = I$.

3. $2 \le |I| \le m$. Let $I = \{i_1, i_2, \ldots, i_k\}$, $0 \le i_1 < i_2 < \ldots < i_k \le m-1$,
$2 \le k \le m$. $\delta_A(\{m-1\}, w) = I$, where

$$w = ab(ac)^{i_2-i_1-1}ab(ac)^{i_3-i_2-1} \cdots ab(ac)^{i_k-i_{k-1}-1}a^{m-1-i_k}.$$

(II) Any two different states $I$ and $J$ in $Q_A$ are distinguishable.

Without loss of generality, we may assume that $|I| \ge |J|$. Let $x \in I - J$.
Then a string $a^x$ can distinguish these two states because

$$\delta_A(I, a^x) \in F_A,$$
$$\delta_A(J, a^x) \notin F_A.$$

Due to (I) and (II), $A$ is a minimal DFA with $2^m$ states which accepts
$L(M)^R$. Now let $B = (Q_B, \Sigma, \delta_B, s_B, F_B)$ be another DFA, where

$$Q_B = \{(p, q) \mid p \in Q_A - F_A, q \subseteq Q_N\} \cup \{(p', q') \mid p' \in F_A, q' \subseteq Q_N, 0 \in q'\},$$
$$\Sigma = \{a, b, c, d\},$$
$$s_B = (\{m-1\}, \emptyset),$$
$$F_B = \{(p, q) \mid n-1 \in q, (p, q) \in Q_B\},$$

and for each state $(p, q) \in Q_B$ and each letter $e \in \Sigma$,

$$\delta_B((p, q), e) = \begin{cases} (p', q') & \text{if } \delta_A(p, e) = p' \notin F_A, \delta_N(q, e) = q', \\ (p', q') & \text{if } \delta_A(p, e) = p' \in F_A, \delta_N(q, e) = r', q' = r' \cup \{0\}. \end{cases}$$



As we mentioned in the last proof, all the states starting with $p \in F_A$ must end
with $q \subseteq Q_N$ such that $0 \in q$. Clearly, $B$ accepts the language $L(M)^R L(N)$
and it has

$$2^m \cdot 2^n - 2^{m-1} \cdot 2^{n-1} = \frac{3}{4} 2^{m+n}$$

states. Now we show that $B$ is a minimal DFA.

(I) Every state $(p, q) \in Q_B$ is reachable. We consider the following six cases:

1. $p = \emptyset$, $q = \emptyset$. $(\emptyset, \emptyset)$ is the sink state of $B$. $\delta_B((\{m-1\}, \emptyset), b) = (p, q)$.

2. $p \neq \emptyset$, $q = \emptyset$. Let $p = \{p_1, p_2, \ldots, p_k\}$, $1 \le p_1 < p_2 < \ldots < p_k \le m-1$,
$1 \le k \le m-1$. Note that $0 \notin p$, because $0 \in p$ guarantees $0 \in q$.
$\delta_B((\{m-1\}, \emptyset), w) = (p, q)$, where

$$w = ab(ac)^{p_2-p_1-1}ab(ac)^{p_3-p_2-1} \cdots ab(ac)^{p_k-p_{k-1}-1}a^{m-1-p_k}.$$

Please note that $w = a^{m-1-p_1}$ when $k = 1$.

3. $p = \emptyset$, $q \neq \emptyset$. In this case, let $q = \{q_1, q_2, \ldots, q_l\}$, $0 \le q_1 < q_2 < \ldots <
q_l \le n-1$, $1 \le l \le n$. $\delta_B((\{m-1\}, \emptyset), x) = (p, q)$, where

$$x = a^m d^{q_l-q_{l-1}} a^m d^{q_{l-1}-q_{l-2}} \cdots a^m d^{q_2-q_1} a^m d^{q_1} b.$$

4. $p \neq \emptyset$, $0 \notin p$, $q \neq \emptyset$. Let $p = \{p_1, p_2, \ldots, p_k\}$, $1 \le p_1 < p_2 < \ldots < p_k \le
m-1$, $1 \le k \le m-1$ and $q = \{q_1, q_2, \ldots, q_l\}$, $0 \le q_1 < q_2 < \ldots < q_l \le
n-1$, $1 \le l \le n$. We can find a string $uv$ such that $\delta_B((\{m-1\}, \emptyset), uv) =
(p, q)$, where

$$u = ab(ac)^{p_2-p_1-1}ab(ac)^{p_3-p_2-1} \cdots ab(ac)^{p_k-p_{k-1}-1}a^{m-1-p_k},$$
$$v = a^m d^{q_l-q_{l-1}} a^m d^{q_{l-1}-q_{l-2}} \cdots a^m d^{q_2-q_1} a^m d^{q_1}.$$

5. $p \neq \emptyset$, $0 \in p$, $m-1 \notin p$, $q \neq \emptyset$. Let $p = \{p_1, p_2, \ldots, p_k\}$, $0 = p_1 <
p_2 < \ldots < p_k < m-1$, $1 \le k \le m-1$ and $q = \{q_1, q_2, \ldots, q_l\}$, $0 =
q_1 < q_2 < \ldots < q_l \le n-1$, $1 \le l \le n$. Since $0$ is in $p$, according to the
definition of $B$, $0$ has to be in $q$ as well. There exists a string $u'v'$ such
that $\delta_B((\{m-1\}, \emptyset), u'v') = (p, q)$, where

$$u' = ab(ac)^{p_2-p_1-1}ab(ac)^{p_3-p_2-1} \cdots ab(ac)^{p_k-p_{k-1}-1}a^{m-2-p_k},$$
$$v' = a^m d^{q_l-q_{l-1}} a^m d^{q_{l-1}-q_{l-2}} \cdots a^m d^{q_2-q_1} a^m d^{q_1} a.$$

6. $p \neq \emptyset$, $\{0, m-1\} \subseteq p$, $q \neq \emptyset$. Let $p = \{p_1, p_2, \ldots, p_k\}$, $0 = p_1 < p_2 <
\ldots < p_k = m-1$, $2 \le k \le m$ and $q = \{q_1, q_2, \ldots, q_l\}$, $0 = q_1 < q_2 < \ldots <
q_l \le n-1$, $1 \le l \le n$. In this case, we have

$$(p, q) = \begin{cases} \delta_B((\{0, 1, p_2+1, \ldots, p_{k-1}+1\}, q), a) & \text{if } m-2 \notin p, \\ \delta_B((p - \{m-1\}, q), b) & \text{if } m-2 \in p, \end{cases}$$

where states $(\{0, 1, p_2+1, \ldots, p_{k-1}+1\}, q)$ and $(p - \{m-1\}, q)$ have been
proved to be reachable in Case 5.

(II) We then show that any two different states $(p_1, q_1)$ and $(p_2, q_2)$ in $Q_B$
are distinguishable.

1. $q_1 \neq q_2$. Without loss of generality, we may assume that $|q_1| \ge |q_2|$. Let
$x \in q_1 - q_2$. A string $d^{n-1-x}$ can distinguish them because

$$\delta_B((p_1, q_1), d^{n-1-x}) \in F_B,$$
$$\delta_B((p_2, q_2), d^{n-1-x}) \notin F_B.$$

2. $p_1 \neq p_2$, $q_1 = q_2$. Without loss of generality, we assume that $|p_1| \ge |p_2|$.
Let $y \in p_1 - p_2$. Then there always exists a string $a^y c^2 d^n$ such that

$$\delta_B((p_1, q_1), a^y c^2 d^n) \in F_B,$$
$$\delta_B((p_2, q_2), a^y c^2 d^n) \notin F_B.$$

Since all the states in $B$ are reachable and pairwise distinguishable, DFA $B$ is
minimal. Thus, any DFA accepting $L(M)^R L(N)$ needs at least $\frac{3}{4}2^{m+n}$ states.

□ 
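
For small $m$ and $n$ the lower bound can be confirmed mechanically. The sketch below encodes the witness DFAs $M$ and $N$ as read from the transition lists above, follows the construction of $B$ (reversed moves of $M$ in the first component, moves of $N$ in the second, adding state 0 of $N$ whenever the first component contains 0), and counts the states of the minimal DFA. The function names and the tuple-of-frozensets encoding are illustrative assumptions, and the minimization routine is a generic Moore refinement rather than the argument used in the proof.

```python
from collections import deque

def minimal_size(start, step, is_final, alphabet):
    """Count states of the minimal DFA: explore the reachable states, then
    refine the final / non-final partition until it is stable."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for e in alphabet:
            t = step(s, e)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    block = {s: int(is_final(s)) for s in seen}
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for s in seen:
            sig = (block[s],) + tuple(block[step(s, e)] for e in alphabet)
            refined[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = refined, len(ids)

def witness_M(m):
    """Witness M: states 0..m-1, initial state 0, final state m-1."""
    d = {}
    for i in range(m):
        d[i, 'a'] = (i + 1) % m
        d[i, 'b'] = m - 2 if i == m - 1 else i
        d[i, 'c'] = i
        d[i, 'd'] = i
    d[m - 2, 'c'], d[m - 1, 'c'] = m - 1, m - 2   # c swaps m-2 and m-1
    return d

def witness_N(n):
    """Witness N: states 0..n-1, initial state 0, final state n-1."""
    d = {}
    for i in range(n):
        d[i, 'a'] = d[i, 'b'] = i
        d[i, 'c'] = 0
        d[i, 'd'] = (i + 1) % n
    return d

def sc_reverse_then_catenate(m, n):
    dM, dN = witness_M(m), witness_N(n)
    def step(state, e):
        p, q = state
        p2 = frozenset(j for j in range(m) if dM[j, e] in p)  # move of A
        q2 = frozenset(dN[j, e] for j in q)                   # move of N
        if 0 in p2:                # p2 is final in A: a run of N may start
            q2 = q2 | {0}
        return (p2, q2)
    start = (frozenset({m - 1}), frozenset())
    return minimal_size(start, step, lambda s: (n - 1) in s[1], "abcd")

for m, n in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    print(m, n, sc_reverse_then_catenate(m, n), 3 * 2 ** (m + n - 2))
```

The two printed numbers in each row are the computed minimal size and the value $\frac{3}{4}2^{m+n}$; the enumeration grows exponentially, so this check is only practical for small sizes.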

This result gives a lower bound for the state complexity of $L_1^R L_2$ when
$m, n \ge 2$. It coincides with the upper bound shown in Theorem 1 exactly. Thus,
we obtain the state complexity of the combined operation $L_1^R L_2$ for $m \ge 2$ and
$n \ge 2$.

Theorem 3. For any integers $m, n \ge 2$, let $L_1$ be an $m$-state DFA language
and $L_2$ be an $n$-state DFA language. Then $\frac{3}{4}2^{m+n}$ states are both necessary and
sufficient in the worst case for a DFA to accept $L_1^R L_2$.

In the rest of this section, we study the remaining cases when either $m = 1$
or $n = 1$.

We first consider the case when $m = 1$ and $n \ge 2$. In this case, $L_1 = \emptyset$
or $L_1 = \Sigma^*$. $L_1^R L_2 = L_1 L_2$ holds no matter whether $L_1$ is $\emptyset$ or $\Sigma^*$, since $\emptyset^R = \emptyset$ and
$(\Sigma^*)^R = \Sigma^*$. It has been shown in [53] that $2^{n-1}$ states are both sufficient and
necessary in the worst case for a DFA to accept the catenation of a 1-state DFA
language and an $n$-state DFA language, $n \ge 2$.

When $m = 1$ and $n = 1$, it is also easy to see that 1 state is sufficient and
necessary in the worst case for a DFA to accept $L_1^R L_2$, because $L_1^R L_2$ is either
$\emptyset$ or $\Sigma^*$. Thus, we have the following theorem concerning the state complexity
of $L_1^R L_2$ for $m = 1$ and $n \ge 1$.

Theorem 4. Let $L_1$ be a 1-state DFA language and $L_2$ be an $n$-state DFA
language, $n \ge 1$. Then $2^{n-1}$ states are both sufficient and necessary in the
worst case for a DFA to accept $L_1^R L_2$.

Now, we study the state complexity of $L_1^R L_2$ for $m \ge 2$ and $n = 1$. Let us
start with the following upper bound.

Theorem 5. For any integer $m \ge 2$, let $L_1$ and $L_2$ be two regular languages
accepted by an $m$-state DFA and a 1-state DFA, respectively. Then there exists
a DFA of at most $2^{m-1} + 1$ states that accepts $L_1^R L_2$.

Proof. Let $M = (Q_M, \Sigma, \delta_M, s_M, F_M)$ be a DFA of $m$ states, $m \ge 2$, $k_1$ final
states and $L_1 = L(M)$. Let $N$ be another DFA of 1 state and $L_2 = L(N)$. Since
$N$ is a complete DFA, as we mentioned before, $L(N)$ is either $\emptyset$ or $\Sigma^*$. Clearly,
$L_1^R \cdot \emptyset = \emptyset$. Thus, we need to consider only the case $L_2 = L(N) = \Sigma^*$.

We construct an NFA $M' = (Q_M, \Sigma, \delta_{M'}, F_M, \{s_M\})$ with $k_1$ initial states,
which is similar to the proof of Theorem 1: $\delta_{M'}(p, a) = q$ if $\delta_M(q, a) = p$, where
$a \in \Sigma$ and $p, q \in Q_M$. It is easy to see that

$$L(M') = L(M)^R = L_1^R.$$

By performing subset construction on NFA $M'$, we get an equivalent,
$2^m$-state DFA $A = (Q_A, \Sigma, \delta_A, s_A, F_A)$ such that $L(A) = L_1^R$. $F_A = \{i \mid i \subseteq
Q_M, s_M \in i\}$ because $M'$ has only one final state $s_M$. Thus, $A$ has $2^{m-1}$ final
states in total.

Define $B = (Q_B, \Sigma, \delta_B, s_B, \{f_B\})$ where $f_B \notin Q_A$, $Q_B = (Q_A - F_A) \cup \{f_B\}$,

$$s_B = \begin{cases} s_A & \text{if } s_A \notin F_A, \\ f_B & \text{otherwise,} \end{cases}$$

and for any $a \in \Sigma$ and $p \in Q_B$,

$$\delta_B(p, a) = \begin{cases} \delta_A(p, a) & \text{if } \delta_A(p, a) \notin F_A, \\ f_B & \text{if } \delta_A(p, a) \in F_A, \\ f_B & \text{if } p = f_B. \end{cases}$$

The automaton $B$ is exactly the same as $A$ except that $A$'s $2^{m-1}$ final states
are made to be sink states and these sink, final states are merged into one,
since they are equivalent. When the computation reaches the final state $f_B$, it
remains there. Now, it is clear that $B$ has

$$2^m - 2^{m-1} + 1 = 2^{m-1} + 1$$

states and $L(B) = L_1^R \Sigma^*$. □

This theorem shows an upper bound for the state complexity of $L_1^R L_2$ for
$m \ge 2$ and $n = 1$. Next we prove that this upper bound is reachable.

Lemma 1. Given an integer $m = 2$ or $3$, there exists an $m$-state DFA $M$
and a 1-state DFA $N$ such that any DFA accepting $L(M)^R L(N)$ needs at least
$2^{m-1} + 1$ states.

Proof. When $m = 2$ and $n = 1$, we can construct the following witness DFAs.
Let $M = (\{0, 1\}, \Sigma, \delta_M, 0, \{1\})$ be a DFA, where $\Sigma = \{a, b\}$, and the transitions
are given as:

• $\delta_M(0, a) = 1$, $\delta_M(1, a) = 0$,

• $\delta_M(0, b) = 0$, $\delta_M(1, b) = 0$.

Let $N$ be the DFA accepting $\Sigma^*$. Then the resulting DFA for $L(M)^R \Sigma^*$ is
$A = (\{0, 1, 2\}, \Sigma, \delta_A, 0, \{1\})$ where

• $\delta_A(0, a) = 1$, $\delta_A(1, a) = 1$, $\delta_A(2, a) = 2$,

• $\delta_A(0, b) = 2$, $\delta_A(1, b) = 1$, $\delta_A(2, b) = 2$.

When $m = 3$ and $n = 1$, the witness DFAs are as follows. Let $M' =
(\{0, 1, 2\}, \Sigma', \delta_{M'}, 0, \{2\})$ be a DFA, where $\Sigma' = \{a, b, c\}$, and the transitions
are:

• $\delta_{M'}(0, a) = 1$, $\delta_{M'}(1, a) = 2$, $\delta_{M'}(2, a) = 0$,

• $\delta_{M'}(0, b) = 0$, $\delta_{M'}(1, b) = 1$, $\delta_{M'}(2, b) = 1$,

• $\delta_{M'}(0, c) = 0$, $\delta_{M'}(1, c) = 2$, $\delta_{M'}(2, c) = 1$.

Let $N'$ be the DFA accepting $\Sigma'^*$. The resulting DFA for $L(M')^R \Sigma'^*$ is $A' =
(\{0, 1, 2, 3, 4\}, \Sigma', \delta_{A'}, 0, \{3\})$ where

• $\delta_{A'}(0, a) = 1$, $\delta_{A'}(1, a) = 3$, $\delta_{A'}(2, a) = 2$, $\delta_{A'}(3, a) = 3$, $\delta_{A'}(4, a) = 3$,

• $\delta_{A'}(0, b) = 2$, $\delta_{A'}(1, b) = 4$, $\delta_{A'}(2, b) = 2$, $\delta_{A'}(3, b) = 3$, $\delta_{A'}(4, b) = 4$,

• $\delta_{A'}(0, c) = 1$, $\delta_{A'}(1, c) = 0$, $\delta_{A'}(2, c) = 2$, $\delta_{A'}(3, c) = 3$, $\delta_{A'}(4, c) = 4$.

□ 

The above result shows that the bound $2^{m-1} + 1$ is reachable when $m$ is
equal to 2 or 3 and $n = 1$. The last case is $m \ge 4$ and $n = 1$.

Theorem 6. Given an integer $m \ge 4$, there exists a DFA $M$ of $m$ states and
a DFA $N$ of 1 state such that any DFA accepting $L(M)^R L(N)$ needs at least
$2^{m-1} + 1$ states.

Proof. Let $M = (Q_M, \Sigma, \delta_M, 0, \{m-1\})$ be a DFA, shown in Figure 3, where
$Q_M = \{0, 1, \ldots, m-1\}$, $m \ge 4$, $\Sigma = \{a, b, c, d\}$, and the transitions are given
as:

• $\delta_M(i, a) = i + 1 \bmod m$, $i = 0, \ldots, m-1$,

• $\delta_M(i, b) = i$, $i = 0, \ldots, m-2$, $\delta_M(m-1, b) = m-2$,

• $\delta_M(i, c) = i$, $i = 0, \ldots, m-3$, $\delta_M(m-2, c) = m-1$, $\delta_M(m-1, c) = m-2$,

• $\delta_M(0, d) = 0$, $\delta_M(i, d) = i + 1$, $i = 1, \ldots, m-2$, $\delta_M(m-1, d) = 1$.




Figure 3: Witness DFA M of Theorem 6 showing that the upper bound in
Theorem 5 is reachable when $m \ge 4$ and $n = 1$

Let $N$ be the DFA accepting $\Sigma^*$. Then $L(M)^R L(N) = L(M)^R \Sigma^*$. Now we
design a DFA $A = (Q_A, \Sigma, \delta_A, \{m-1\}, F_A)$ similar to the proof of Theorem 2,
where $Q_A = \{q \mid q \subseteq Q_M\}$, $\Sigma = \{a, b, c, d\}$, $F_A = \{q \mid 0 \in q, q \in Q_A\}$, and the
transitions are defined as:

$$\delta_A(p, e) = \{j \mid \delta_M(j, e) = i, i \in p\}, \quad p \in Q_A, e \in \Sigma.$$

It is easy to see that $A$ is a DFA that accepts $L(M)^R$. Since the transitions of
$M$ on letters $a$, $b$, and $c$ are exactly the same as those of DFA $M$ in the proof
of Theorem 2, we can say that $A$ is minimal and it has $2^m$ states, among which
$2^{m-1}$ states are final.

Define $B = (Q_B, \Sigma, \delta_B, s_B, \{f_B\})$ where $f_B \notin Q_A$, $Q_B = (Q_A - F_A) \cup \{f_B\}$,

$$s_B = \begin{cases} s_A & \text{if } s_A \notin F_A, \\ f_B & \text{otherwise,} \end{cases}$$

and for any $e \in \Sigma$ and $I \in Q_B$,

$$\delta_B(I, e) = \begin{cases} \delta_A(I, e) & \text{if } \delta_A(I, e) \notin F_A, \\ f_B & \text{if } \delta_A(I, e) \in F_A, \\ f_B & \text{if } I = f_B. \end{cases}$$

DFA $B$ is the same as $A$ except that $A$'s $2^{m-1}$ final states are changed into sink
states and merged to one sink, final state, as we did in the proof of Theorem 5.
Clearly, $B$ has $2^m - 2^{m-1} + 1 = 2^{m-1} + 1$ states and $L(B) = L(M)^R \Sigma^*$. Next
we show that $B$ is a minimal DFA.

(I) Every state $I \in Q_B$ is reachable from $\{m-1\}$. The proof is similar to
that of Theorem 2. We consider the following four cases:

1. $I = \emptyset$. $\delta_B(\{m-1\}, b) = I = \emptyset$.

2. $I = f_B$. $\delta_B(\{m-1\}, a^{m-1}) = I = f_B$.

3. $|I| = 1$. Assume that $I = \{i\}$, $1 \le i \le m-1$. Note that $i \neq 0$ because
all the final states in $A$ have been merged into $f_B$. In this case, $\delta_B(\{m-1\}, a^{m-1-i}) = I$.

4. $2 \le |I| < m$. Assume that $I = \{i_1, i_2, \ldots, i_k\}$, $1 \le i_1 < i_2 < \ldots < i_k \le
m-1$, $2 \le k < m$. $\delta_B(\{m-1\}, w) = I$, where

$$w = ab(ac)^{i_2-i_1-1}ab(ac)^{i_3-i_2-1} \cdots ab(ac)^{i_k-i_{k-1}-1}a^{m-1-i_k}.$$

(II) Any two different states $I$ and $J$ in $Q_B$ are distinguishable.

Since $f_B$ is the only final state in $Q_B$, it is inequivalent to any other state.
Thus, we consider the case when neither of $I$ and $J$ is $f_B$.

Without loss of generality, we may assume that $|I| \ge |J|$. Let $x \in I - J$. $x$ is
always greater than 0 because all the states which include 0 have been merged
into $f_B$. Then a string $d^{x-1}a$ can distinguish these two states because

$$\delta_B(I, d^{x-1}a) = f_B,$$
$$\delta_B(J, d^{x-1}a) \neq f_B.$$

Since all the states in $B$ are reachable and pairwise distinguishable, $B$ is a
minimal DFA. Thus, any DFA accepting $L(M)^R \Sigma^*$ needs at least $2^{m-1} + 1$
states. □
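
The bound $2^{m-1} + 1$ can also be checked by direct computation for small $m \ge 4$. The sketch below builds the DFA $B$ for the witness $M$ as read from the transition list above: subsets of $Q_M$ move along reversed transitions, and any subset containing 0 is replaced by a single accepting sink. The names `SINK`, `witness_M` and `sc_reverse_then_sigma_star` are assumptions made for this illustration, and the minimization is a generic Moore refinement.

```python
from collections import deque

def minimal_size(start, step, is_final, alphabet):
    """Reachable-state exploration followed by Moore partition refinement."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for e in alphabet:
            t = step(s, e)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    block = {s: int(is_final(s)) for s in seen}
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for s in seen:
            sig = (block[s],) + tuple(block[step(s, e)] for e in alphabet)
            refined[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = refined, len(ids)

def witness_M(m):
    """Witness M for m >= 4: states 0..m-1, initial 0, final m-1."""
    d = {}
    for i in range(m):
        d[i, 'a'] = (i + 1) % m
        d[i, 'b'] = m - 2 if i == m - 1 else i
        d[i, 'c'] = i
        d[i, 'd'] = i + 1                     # overwritten below for 0 and m-1
    d[m - 2, 'c'], d[m - 1, 'c'] = m - 1, m - 2
    d[0, 'd'], d[m - 1, 'd'] = 0, 1           # d fixes 0 and cycles 1..m-1
    return d

SINK = "f_B"   # hypothetical label for the merged accepting sink state

def sc_reverse_then_sigma_star(m):
    dM = witness_M(m)
    def step(I, e):
        if I == SINK:
            return SINK
        J = frozenset(j for j in range(m) if dM[j, e] in I)  # reversed move
        return SINK if 0 in J else J
    return minimal_size(frozenset({m - 1}), step, lambda I: I == SINK, "abcd")

for m in (4, 5, 6):
    print(m, sc_reverse_then_sigma_star(m), 2 ** (m - 1) + 1)
```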

After summarizing Theorem 5, Theorem 6 and Lemma 1, we obtain the state
complexity of the combined operation $L_1^R L_2$ for $m \ge 2$ and $n = 1$.

Theorem 7. For any integer $m \ge 2$, let $L_1$ be an $m$-state DFA language and
$L_2$ be a 1-state DFA language. Then $2^{m-1} + 1$ states are both sufficient and
necessary in the worst case for a DFA to accept $L_1^R L_2$.






4 Star combined with catenation 



In this section, we investigate the state complexity of $L(A)^* L(B)$ for two DFAs
$A$ and $B$ of sizes $m, n \ge 1$, respectively. We first notice that, when $n = 1$,
the state complexity of $L(A)^* L(B)$ is 1 for any $m \ge 1$. This is because $B$
is complete ($L(B)$ is either $\emptyset$ or $\Sigma^*$), and we have either $L(A)^* L(B) = \emptyset$ or
$\Sigma^* \subseteq L(A)^* L(B) \subseteq \Sigma^*$. Thus, $L(A)^* L(B)$ is always accepted by a 1-state DFA.
Next, we consider the case where $A$ has only one final state and it is also the
initial state. In such a case, $L(A)^*$ is also accepted by $A$, and hence the state
complexity of $L(A)^* L(B)$ is equal to that of $L(A)L(B)$. We will show that, for
any $A$ of size $m \ge 1$ in this form and any $B$ of size $n \ge 2$, the state complexity of
$L(A)L(B)$ (also $L(A)^* L(B)$) is $m(2^n - 1) - 2^{n-1} + 1$ (Theorems 8 and 9), which
is lower than the state complexity of catenation in the general case. Lastly,
we consider the state complexity of $L(A)^* L(B)$ in the remaining case, that is,
when $A$ has at least one final state that is not the initial state and $n \ge 2$. We
will show that its upper bound (Theorem 10) coincides with its lower bound
(Theorem 11), and the state complexity is $5 \cdot 2^{m+n-3} - 2^{m-1} - 2^n + 1$.

Now, we consider the case where DFA $A$ has only one final state and it is
also the initial state, and first obtain the following upper bound of the state
complexity of $L(A)L(B)$ ($L(A)^* L(B)$), for any DFA $B$ of size $n \ge 2$.

Theorem 8. For integers $m \ge 1$ and $n \ge 2$, let $A$ and $B$ be two DFAs with
$m$ and $n$ states, respectively, where $A$ has only one final state and it is also the
initial state. Then, there exists a DFA of at most $m(2^n - 1) - 2^{n-1} + 1$ states
that accepts $L(A)L(B)$, which is equal to $L(A)^* L(B)$.

Proof. Let $A = (Q_1, \Sigma, \delta_1, s_1, \{s_1\})$ and $B = (Q_2, \Sigma, \delta_2, s_2, F_2)$. We construct a
DFA $C = (Q, \Sigma, \delta, s, F)$ such that

$$Q = Q_1 \times (2^{Q_2} - \{\emptyset\}) - \{s_1\} \times (2^{Q_2 - \{s_2\}} - \{\emptyset\}),$$
$$s = (s_1, \{s_2\}),$$
$$F = \{(q, T) \in Q \mid T \cap F_2 \neq \emptyset\},$$

$\delta((q, T), a) = (q', T')$, for $a \in \Sigma$, where $q' = \delta_1(q, a)$ and $T' = R \cup \{s_2\}$
if $q' = s_1$, $T' = R$ otherwise, where $R = \delta_2(T, a)$.

Intuitively, $Q$ contains the pairs whose first component is a state of $Q_1$ and
second component is a subset of $Q_2$. Since $s_1$ is the final state of $A$, without
reading any letter, we can enter the initial state of $B$. Thus, states $(q, \emptyset)$ such
that $q \in Q_1$ can never be reached in $C$, because $B$ is complete. Moreover, $Q$
does not contain those states whose first component is $s_1$ and second component
does not contain $s_2$.

Clearly, $C$ has $m(2^n - 1) - 2^{n-1} + 1$ states, and we can verify that $L(C) =
L(A)L(B)$. □

Next, we show that this upper bound can be reached by some witness DFAs 
in the specific form. 

Figure 4: Witness DFA A for Theorem 9 when $m \ge 2$

Theorem 9. For any integers $m \ge 1$ and $n \ge 2$, there exist a DFA $A$ of $m$
states and a DFA $B$ of $n$ states, where $A$ has only one final state and it is also
the initial state, such that any DFA accepting the language $L(A)L(B)$, which is
equal to $L(A)^* L(B)$, needs at least $m(2^n - 1) - 2^{n-1} + 1$ states.

Proof. When $m = 1$, the witness DFAs used in the proof of Theorem 1 in [25]
can be used to show that the upper bound proposed in Theorem 8 can be
reached.

Next, we consider the case when $m \ge 2$. We provide witness DFAs $A$ and
$B$, depicted in Figures 4 and 5, respectively, over the three letter alphabet
$\Sigma = \{a, b, c\}$.

$A$ is defined as $A = (Q_1, \Sigma, \delta_1, 0, \{0\})$ where $Q_1 = \{0, 1, \ldots, m-1\}$, and the
transitions are given as

• $\delta_1(i, a) = i + 1 \bmod m$, for $i \in Q_1$,

• $\delta_1(i, x) = i$, for $i \in Q_1$, where $x \in \{b, c\}$.

$B$ is defined as $B = (Q_2, \Sigma, \delta_2, 0, \{n-1\})$ where $Q_2 = \{0, 1, \ldots, n-1\}$,
where the transitions are given as

• $\delta_2(i, a) = i$, for $i \in Q_2$,

• $\delta_2(i, b) = i + 1 \bmod n$, for $i \in Q_2$,

• $\delta_2(0, c) = 0$, $\delta_2(i, c) = i + 1 \bmod n$, for $i \in \{1, \ldots, n-1\}$.

Following the construction described in the proof of Theorem 8, we construct
a DFA $C = (Q, \Sigma, \delta, s, F)$ that accepts $L(A)L(B)$ (also $L(A)^* L(B)$). To prove
that $C$ is minimal, we show that (I) all the states in $Q$ are reachable from $s$,
and (II) any two different states in $Q$ are not equivalent.




For (I), we show that all the states in $Q$ are reachable by induction on the
size of $T$.

The basis clearly holds, since, for any $i \in Q_1$, state $(i, \{0\})$ is reachable from
$(0, \{0\})$ by reading string $a^i$, and state $(i, \{j\})$ can be reached from state $(i, \{0\})$
on string $b^j$, for any $i \in \{1, \ldots, m-1\}$ and $j \in Q_2$.

In the induction steps, we assume that all the states $(q, T)$ such that $|T| < k$
are reachable. Then, we consider the states $(q, T)$ where $|T| = k$. Let $T =
\{j_1, j_2, \ldots, j_k\}$ such that $0 \le j_1 < j_2 < \ldots < j_k \le n-1$. We consider the
following three cases:

1. $j_1 = 0$ and $j_2 = 1$. For any state $i \in Q_1$, state $(i, T) \in Q$ can be reached
as

$$(i, \{0, 1, j_3, \ldots, j_k\}) = \delta((0, \{0, j_3-1, \ldots, j_k-1\}), ba^i),$$

where $\{0, j_3-1, \ldots, j_k-1\}$ is of size $k-1$.

2. $j_1 = 0$ and $j_2 > 1$. For any state $i \in Q_1$, state $(i, \{0, j_2, \ldots, j_k\})$ can be
reached from state $(i, \{0, 1, j_3-j_2+1, \ldots, j_k-j_2+1\})$ by reading string
$c^{j_2-1}$.

3. $j_1 > 0$. In such a case, the first component of state $(q, T)$ cannot be 0.
Thus, for any state $i \in \{1, \ldots, m-1\}$, state $(i, \{j_1, j_2, \ldots, j_k\})$ can be
reached from state $(i, \{0, j_2-j_1, \ldots, j_k-j_1\})$ by reading string $b^{j_1}$.

Next, we show that any two distinct states $(q, T)$ and $(q', T')$ in $Q$ are not
equivalent. We consider the following two cases:

1. $q \neq q'$. Without loss of generality, we assume $q \neq 0$. Then, string $w =
c^{n-1}a^{m-q}b^n$ can distinguish the two states, since $\delta((q, T), w) \in F$ and
$\delta((q', T'), w) \notin F$.

2. $q = q'$ and $T \neq T'$. Without loss of generality, we assume that $|T| \ge |T'|$.
Then, there exists a state $j \in T - T'$. It is clear that, when $q \neq 0$, string
$b^{n-1-j}$ can distinguish the two states, and when $q = 0$, string $c^{n-1-j}$ can
distinguish the two states since $j$ cannot be 0.

Due to (I) and (II), DFA $C$ needs at least $m(2^n - 1) - 2^{n-1} + 1$ states and
is minimal. □
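
As a sanity check on this lower bound, the following sketch runs the construction from the proof of Theorem 8 on the witness DFAs $A$ and $B$ and counts the states of the resulting minimal DFA for small $m, n \ge 2$. The encoding of states as pairs `(q, T)` with `T` a frozenset, and all function names, are assumptions made for this illustration.

```python
from collections import deque

def minimal_size(start, step, is_final, alphabet):
    """Reachable-state exploration followed by Moore partition refinement."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for e in alphabet:
            t = step(s, e)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    block = {s: int(is_final(s)) for s in seen}
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for s in seen:
            sig = (block[s],) + tuple(block[step(s, e)] for e in alphabet)
            refined[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = refined, len(ids)

def sc_catenation_witness(m, n):
    def d1(q, e):                 # witness A: a cycles, b and c are identities
        return (q + 1) % m if e == 'a' else q
    def d2(j, e):                 # witness B
        if e == 'a':
            return j
        if e == 'b':
            return (j + 1) % n
        return 0 if j == 0 else (j + 1) % n   # letter c fixes state 0
    def step(state, e):
        q, T = state
        q2 = d1(q, e)
        R = frozenset(d2(j, e) for j in T)
        # Entering s_1 = 0 (the only final state of A) starts a new run of B.
        return (q2, R | {0}) if q2 == 0 else (q2, R)
    start = (0, frozenset({0}))
    return minimal_size(start, step, lambda s: (n - 1) in s[1], "abc")

for m, n in [(2, 2), (3, 2), (2, 3), (3, 3)]:
    print(m, n, sc_catenation_witness(m, n), m * (2 ** n - 1) - 2 ** (n - 1) + 1)
```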

In the rest of this section, we focus on the case where DFA A contains at 
least one final state that is not the initial state. Thus, this DFA is of size at 
least 2. We first obtain the following upper bound for the state complexity. 

Theorem 10. Let $A = (Q_1, \Sigma, \delta_1, s_1, F_1)$ be a DFA such that $|Q_1| = m > 1$
and $|F_1 - \{s_1\}| = k_1 \ge 1$, and $B = (Q_2, \Sigma, \delta_2, s_2, F_2)$ be a DFA such that
$|Q_2| = n > 1$. Then, there exists a DFA of at most
$(\frac{3}{4}2^m - 1)(2^n - 1) - (2^{m-1} - 2^{m-k_1-1})(2^{n-1} - 1)$ states that accepts $L(A)^* L(B)$.

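Proof. We denote $F_1 - \{s_1\}$ by $F_0$. Then, $|F_0| = k_1 \ge 1$.

We construct a DFA $C = (Q, \Sigma, \delta, s, F)$ for the language $L_1^* L_2$, where $L_1$
and $L_2$ are the languages accepted by DFAs $A$ and $B$, respectively.

Let $Q = \{(p, t) \mid p \in P \text{ and } t \in T\} - \{(p', t') \mid p' \in P' \text{ and } t' \in T'\}$, where

$$P = \{R \mid R \subseteq (Q_1 - F_0) \text{ and } R \neq \emptyset\} \cup \{R \mid R \subseteq Q_1, s_1 \in R, \text{ and } R \cap F_0 \neq \emptyset\},$$
$$T = 2^{Q_2} - \{\emptyset\},$$
$$P' = \{R \mid R \subseteq Q_1, s_1 \in R, \text{ and } R \cap F_0 \neq \emptyset\},$$
$$T' = 2^{Q_2 - \{s_2\}} - \{\emptyset\}.$$

The initial state $s$ is $s = (\{s_1\}, \{s_2\})$.

The set of final states is defined to be $F = \{(p, t) \in Q \mid t \cap F_2 \neq \emptyset\}$.
The transition relation $\delta$ is defined as follows:

$$\delta((p, t), a) = \begin{cases} (p', t') & \text{if } p' \cap F_1 = \emptyset, \\ (p', t' \cup \{s_2\}) & \text{otherwise,} \end{cases}$$

where $a \in \Sigma$, $p' = \delta_1(p, a)$, and $t' = \delta_2(t, a)$.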

Intuitively, $C$ is equivalent to the NFA $C'$ obtained by first constructing
an NFA $A'$ that accepts $L_1^*$, then catenating this new NFA with DFA $B$ by
$\lambda$-transitions. Note that, in the construction of $A'$, we need to add a new initial and
final state $s_1'$. However, this new state does not appear in the first component of
any of the states in $Q$. The reason is as follows. First, note that this new state
does not have any incoming transitions. Thus, from the initial state $s_1'$ of $A'$,
after reading a nonempty word, we will never return to this state. As a result,
states $(p, t)$ such that $p \subseteq Q_1 \cup \{s_1'\}$, $s_1' \in p$, and $t \in 2^{Q_2}$ are never reached in DFA
$C$ except for the state $(\{s_1'\}, \{s_2\})$. Then, we note that, in the construction of
$A'$, states $s_1'$ and $s_1$ should reach the same state on any letter in $\Sigma$. Thus, we
can say that states $(\{s_1'\}, \{s_2\})$ and $(\{s_1\}, \{s_2\})$ are equivalent, because both
of them are final if $s_2 \in F_2$, and both are non-final otherwise. Hence, we
merge these two states and let $(\{s_1\}, \{s_2\})$ be the initial state of $C$.

Also, we notice that states $(p, \emptyset)$ such that $p \in P$ can never be reached in
$C$, because $B$ is complete.

Moreover, $C$ does not contain those states whose first component contains a
final state of $A$ and whose second component does not contain the initial state
of $B$.

Therefore, we can verify that DFA $C$ indeed accepts $L_1^* L_2$, and it is clear
that the size of $Q$ is

$$(\tfrac{3}{4}2^m - 1)(2^n - 1) - (2^{m-1} - 2^{m-k_1-1})(2^{n-1} - 1).$$

□ 
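
The upper bound depends on $k_1$, while the state complexity quoted for this case does not. A short exact-arithmetic sketch shows how the two agree: the bound is largest for $k_1 = 1$, where it simplifies to $5 \cdot 2^{m+n-3} - 2^{m-1} - 2^n + 1$. The function names `upper_bound` and `closed_form` are assumptions introduced for this check only.

```python
from fractions import Fraction

def upper_bound(m, n, k1):
    """The bound of Theorem 10, evaluated exactly with rational arithmetic."""
    first = (Fraction(3, 4) * 2 ** m - 1) * (2 ** n - 1)
    removed = (2 ** (m - 1) - Fraction(2) ** (m - k1 - 1)) * (2 ** (n - 1) - 1)
    return first - removed

def closed_form(m, n):
    """5 * 2^(m+n-3) - 2^(m-1) - 2^n + 1, an integer for m, n >= 2."""
    return 5 * 2 ** (m + n - 3) - 2 ** (m - 1) - 2 ** n + 1

for m in range(2, 7):
    for n in range(2, 7):
        # k1 = 1 gives exactly the closed form ...
        assert upper_bound(m, n, 1) == closed_form(m, n)
        # ... and larger k1 (at most m - 1 non-initial final states) never exceeds it.
        assert all(upper_bound(m, n, k) <= closed_form(m, n)
                   for k in range(1, m))
```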

Then, we show that this upper bound is reachable by some witness DFAs. 




Figure 6: Witness DFA A for Theorem 11



Theorem 11. For any integers $m, n \ge 2$, there exist a DFA $A$ of $m$ states and
a DFA $B$ of $n$ states such that any DFA accepting $L(A)^* L(B)$ needs at least
$5 \cdot 2^{m+n-3} - 2^{m-1} - 2^n + 1$ states.

Proof. We define the following two automata over a four letter alphabet $\Sigma =
\{a, b, c, d\}$.

Let $A = (Q_1, \Sigma, \delta_1, 0, \{m-1\})$, shown in Figure 6, where $Q_1 = \{0, 1, \ldots, m-1\}$,
and the transitions are defined as

• $\delta_1(i, a) = i + 1 \bmod m$, for $i \in Q_1$,

• $\delta_1(0, b) = 0$, $\delta_1(i, b) = i + 1 \bmod m$, for $i \in \{1, \ldots, m-1\}$,

• $\delta_1(i, x) = i$, for $i \in Q_1$, $x \in \{c, d\}$.

Figure 7: Witness DFA B for Theorem 11

Let $B = (Q_2, \Sigma, \delta_2, 0, \{n-1\})$, shown in Figure 7, where $Q_2 = \{0, 1, \ldots, n-1\}$,
and the transitions are defined as

• $\delta_2(i, x) = i$, for $i \in Q_2$, $x \in \{a, b\}$,

• $\delta_2(i, c) = i + 1 \bmod n$, for $i \in Q_2$,

• $\delta_2(i, d) = 0$, for $i \in Q_2$.

Let $C = (Q, \Sigma, \delta, (\{0\}, \{0\}), F)$ be the DFA accepting the language $L(A)^* L(B)$
which is constructed from $A$ and $B$ exactly as described in the proof of Theorem 10.

Now, we prove that the size of Q is minimal by showing that (I) any state 
in Q can be reached from the initial state, and (II) no two different states in Q 
are equivalent. 

We first prove (I) by induction on the size of the second component t of the 
states in Q. 

Basis: for any $i \in Q_2$, state $(\{0\}, \{i\})$ can be reached from the initial state
$(\{0\}, \{0\})$ on string $c^i$. Then, by the proof of Theorem 5 in [23], it is clear that
state $(p, \{i\})$ of $Q$, where $p \in P$ and $i \in Q_2$, is reachable from state $(\{0\}, \{i\})$
on strings over letters $a$ and $b$.

Induction step: assume that all the states $(p, t)$ in $Q$ such that $p \in P$ and
$|t| < k$ are reachable. Then, we consider the states $(p, t)$ in $Q$ where $p \in P$ and
$|t| = k$. Let $t = \{j_1, j_2, \ldots, j_k\}$ such that $0 \le j_1 < j_2 < \ldots < j_k \le n-1$.

Note that states such that $p = \{0\}$ and $j_1 = 0$ are reachable as follows:

$$(\{0\}, \{0, j_2, \ldots, j_k\}) = \delta((\{0\}, \{0, j_3-j_2, \ldots, j_k-j_2\}), c^{j_2}a^{m-1}b).$$

Then, states such that $p = \{0\}$ and $j_1 > 0$ can be reached as follows:

$$(\{0\}, \{j_1, j_2, \ldots, j_k\}) = \delta((\{0\}, \{0, j_2-j_1, \ldots, j_k-j_1\}), c^{j_1}).$$

Once again, by using the proof of Theorem 5 in [23], states $(p, t)$ in $Q$, where
$p \in P$ and $|t| = k$, can be reached from the state $(\{0\}, t)$ on strings over letters
$a$ and $b$.

Next, we show that any two states in $Q$ are not equivalent. Let $(p, t)$ and
$(p', t')$ be two different states in $Q$. We consider the following two cases:

1. $p \neq p'$. Without loss of generality, we assume $|p| \ge |p'|$. Then, there
exists a state $i \in p - p'$. It is clear that string $a^{m-1-i}dc^n$ is accepted by $C$
starting from state $(p, t)$, but it is not accepted starting from state $(p', t')$.

2. $p = p'$ and $t \neq t'$. We may assume that $|t| \ge |t'|$ and let $j \in t - t'$. Then,
state $(p, t)$ reaches a final state on string $c^{n-1-j}$, but state $(p', t')$ does not
on the same string. Note that, when $m-1 \in p$, we can say that $j \neq 0$.

Due to (I) and (II), DFA $C$ has at least $5 \cdot 2^{m+n-3} - 2^{m-1} - 2^n + 1$ reachable
states, and any two of them are not equivalent. □
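
The count in this proof can be reproduced for small sizes. The sketch below simulates a DFA for $L(A)^* L(B)$ on the witness automata: the first component tracks the states of $A$ inside the star, the second tracks the states of $B$, and whenever the first component reaches the final state $m-1$ of $A$, state 0 is added to both components (a new factor of $L(A)$ may start, and the $L(B)$ part may start). This direct simulation and its function names are assumptions made for illustration; it is intended to match the construction of Theorem 10 on these witnesses, not to restate it.

```python
from collections import deque

def minimal_size(start, step, is_final, alphabet):
    """Reachable-state exploration followed by Moore partition refinement."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for e in alphabet:
            t = step(s, e)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    block = {s: int(is_final(s)) for s in seen}
    count = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for s in seen:
            sig = (block[s],) + tuple(block[step(s, e)] for e in alphabet)
            refined[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = refined, len(ids)

def sc_star_catenation_witness(m, n):
    def d1(i, e):                         # witness A, final state m-1
        if e == 'a':
            return (i + 1) % m
        if e == 'b':
            return 0 if i == 0 else (i + 1) % m
        return i                          # c and d are identities on A
    def d2(j, e):                         # witness B, final state n-1
        if e == 'c':
            return (j + 1) % n
        if e == 'd':
            return 0
        return j                          # a and b are identities on B
    def step(state, e):
        p, t = state
        p2 = frozenset(d1(i, e) for i in p)
        t2 = frozenset(d2(j, e) for j in t)
        if (m - 1) in p2:                 # a factor from L(A) has just ended
            p2 = p2 | {0}                 # star: the next factor may start
            t2 = t2 | {0}                 # catenation: the L(B) part may start
        return (p2, t2)
    start = (frozenset({0}), frozenset({0}))   # empty word is in L(A)*
    return minimal_size(start, step, lambda s: (n - 1) in s[1], "abcd")

for m, n in [(2, 2), (3, 2), (3, 3)]:
    print(m, n, sc_star_catenation_witness(m, n),
          5 * 2 ** (m + n - 3) - 2 ** (m - 1) - 2 ** n + 1)
```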

5 Conclusion 

In this paper, we have studied the state complexities of two combined operations: 
reversal combined with catenation and star combined with catenation. We 
showed that, due to the structural properties of DFAs obtained from reversal 
and star, the state complexities of these two combined operations are not equal 
but close to the mathematical compositions of the state complexities of their 
individual participating operations. 